fix(onboard): increase endpoint probe timeout for large model inference#1080

Closed
ericksoa wants to merge 1 commit into main from fix/onboard-probe-timeout

Conversation

@ericksoa
Contributor

@ericksoa ericksoa commented Mar 30, 2026

Summary

  • Increase --connect-timeout from 5s to 10s and --max-time from 20s to 60s in getCurlTimingArgs()
  • The onboard endpoint probe sends a full inference request to validate the provider. 20s was too tight for nvidia/nemotron-3-super-120b-a12b on NVIDIA Endpoints, causing a curl timeout (exit 28) and non-interactive onboard failure.
  • This probe was added in feat: expand provider onboarding and validation #648 (March 24) but has never passed in the nightly e2e — the e2e has been broken since March 23 for unrelated reasons, so this code path was never validated in CI.
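  • The probe runs only once during onboard, so the longer timeout adds no delay when the endpoint responds promptly; a single failing probe can now take up to 60s to time out instead of 20s.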
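
As a rough illustration of the change summarized above, the timing helper might look like the sketch below. The body of getCurlTimingArgs() in bin/lib/onboard.js is not shown on this page, so the constant names and the array return shape are assumptions; only the flag names and the old and new values come from the PR description.

```javascript
// Hypothetical sketch: the real getCurlTimingArgs() is not shown in this PR page.
// Only the curl flags and their values (5 -> 10, 20 -> 60) are taken from the PR text.
const PROBE_CONNECT_TIMEOUT_SECONDS = 10; // previously 5
const PROBE_MAX_TIME_SECONDS = 60; // previously 20

function getCurlTimingArgs() {
  // --connect-timeout limits the connection phase only;
  // --max-time limits the whole transfer, including waiting for the inference response.
  return [
    "--connect-timeout", String(PROBE_CONNECT_TIMEOUT_SECONDS),
    "--max-time", String(PROBE_MAX_TIME_SECONDS),
  ];
}

console.log(getCurlTimingArgs().join(" "));
// prints "--connect-timeout 10 --max-time 60"
```

Keeping the two limits as named constants would let the test assertions reference the same values instead of hard-coding them twice, though whether the repository does this is not stated here.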

Test plan

Summary by CodeRabbit

  • Bug Fixes

    • Increased network probe timeouts (connect timeout from 5s to 10s, maximum time from 20s to 60s) to improve reliability against slow inference endpoints.
  • Tests

    • Updated test assertions to reflect network probe timeout changes.

The onboard endpoint validation sends a full inference request to verify
the provider is reachable. The 20s max-time was too tight for large
models like nemotron-3-super-120b-a12b on NVIDIA Endpoints, causing
the probe to time out and onboard to fail in non-interactive mode.

Increase connect-timeout from 5s to 10s and max-time from 20s to 60s.
This only runs once during onboard, so the longer timeout is acceptable.

This probe was added in #648 (March 24) but has never run successfully
in the nightly e2e because the e2e has been broken since March 23.
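
The failure described in this commit message surfaces as curl exit code 28, which curl uses for an operation timeout. A minimal sketch of how a caller could tell that case apart from other probe failures follows; classifyProbeExit and its return labels are hypothetical and are not taken from bin/lib/onboard.js.

```javascript
// Hypothetical helper, not part of the PR: maps a curl exit code to a coarse probe outcome.
// Exit code 28 is curl's "operation timeout" code, the failure mode described in this PR.
const CURL_EXIT_OPERATION_TIMEOUT = 28;

function classifyProbeExit(exitCode) {
  if (exitCode === 0) {
    // curl completed the transfer; the HTTP status still needs a separate check.
    return "completed";
  }
  if (exitCode === CURL_EXIT_OPERATION_TIMEOUT) {
    // Either --connect-timeout or --max-time was exceeded.
    return "timeout";
  }
  return "error";
}

console.log(classifyProbeExit(28)); // prints "timeout"
```

Distinguishing a timeout from other errors would let non-interactive onboard report that the endpoint was slow rather than unreachable, which is the situation this PR addresses by raising the limits.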
@ericksoa ericksoa requested a review from cv March 30, 2026 02:17
@coderabbitai
Contributor

coderabbitai bot commented Mar 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 542713b2-8fcd-484a-b23d-a0e9bacd0f45

📥 Commits

Reviewing files that changed from the base of the PR and between a146385 and 7875f3e.

📒 Files selected for processing (2)
  • bin/lib/onboard.js
  • test/credential-exposure.test.js

📝 Walkthrough

Updated curl timeout parameters in the network probe utility from 5 to 10 seconds for connection timeout and 20 to 60 seconds for maximum time. Corresponding test assertions were updated to match the new timeout values.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Curl Timeout Configuration: bin/lib/onboard.js, test/credential-exposure.test.js | Increased curl timing arguments in network probe from --connect-timeout 5 and --max-time 20 to --connect-timeout 10 and --max-time 60. Updated test assertions to verify the new timeout values. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

🐰 Whiskers twitching with delight,
Timeouts doubled, oh what sight!
Ten and sixty now take flight,
Network probes sleep through the night!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: increasing endpoint probe timeouts in the onboard function for better compatibility with large model inference. |

@ericksoa ericksoa marked this pull request as draft March 30, 2026 02:18
@ericksoa ericksoa closed this Mar 30, 2026
@ericksoa ericksoa deleted the fix/onboard-probe-timeout branch March 30, 2026 02:23